1.   INTRODUCTION

Let $X_1, X_2, \ldots$ be a sequence of independent, identically distributed random variables taking values in some measurable space $\mathcal{X}$. Let $P_1, \ldots, P_K$ be a finite collection of probability measures on $\mathcal{X}$, and let $H_k$, $k = 1, \ldots, K$, denote the hypothesis that the common distribution of the $X_n$'s is $P_k$.

We wish to associate with the observed sequence $X_1, X_2, \ldots$ a sequence of decisions $d_1, d_2, \ldots$, $d_n \in \{H_1, \ldots, H_K\}$, about the true hypothesis $H$. However, the decision $d_n$ at time $n$ is allowed to depend on $X_1, \ldots, X_n$ only through a finite-valued statistic $T_n \in \{1, \ldots, m\}$, which represents the current state of the memory. This statistic is updated after each observation, i.e.,

$$T_{n+1} = f(T_n, X_{n+1}), \qquad n = 1, 2, \ldots,$$

where $f : \{1, \ldots, m\} \times \mathcal{X} \to \{1, \ldots, m\}$ is a time-invariant updating rule. The decision $d_n$ is then given by

$$d_n = d(T_n), \qquad n = 1, 2, \ldots,$$

where $d : \{1, \ldots, m\} \to \{H_1, \ldots, H_K\}$ is a time-invariant decision rule.
For a given $f$ and $d$, let

$$P_e^{(k)}(f,d) = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} P(d_n \neq H_k) \tag{1.1}$$

be the asymptotic expected frequency of incorrect decisions if the true hypothesis is $H_k$. Our goal is to find $(f,d)$ which minimizes

$$P_e(f,d) = \max_{k=1,\ldots,K} P_e^{(k)}(f,d). \tag{1.2}$$

(Thus we have chosen the minimax criterion. An alternative approach would be to minimize

$$P_e(\pi;(f,d)) = \sum_{k=1}^{K} \pi_k P_e^{(k)}(f,d),$$

where $\pi = (\pi_1, \ldots, \pi_K)$ is a prior distribution on $\{H_1, \ldots, H_K\}$. In this report we consider the former.)
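The setup above can be illustrated by a small simulation. The following is a minimal sketch, not taken from the report: the two coin biases, the saturating-counter rule `f`, the threshold rule `d`, and the function names are all hypothetical choices made only to show how $T_{n+1} = f(T_n, X_{n+1})$, $d_n = d(T_n)$ and the empirical analogue of (1.1) and (1.2) fit together.

```python
import random

M = 4                      # number of memory states (assumed for illustration)
BIAS = {1: 0.7, 2: 0.3}    # P_k(X = 1) under H_k (assumed values)


def f(state, x):
    """Hypothetical updating rule: a saturating counter on {1, ..., M}."""
    return max(1, state - 1) if x == 1 else min(M, state + 1)


def d(state):
    """Hypothetical decision rule: low states vote H_1, high states vote H_2."""
    return 1 if state <= M // 2 else 2


def simulate_error_rate(true_hypothesis, n_steps, start_state=1, seed=0):
    """Empirical frequency of wrong decisions, cf. (1.1)."""
    rng = random.Random(seed)
    state, errors = start_state, 0
    for _ in range(n_steps):
        x = 1 if rng.random() < BIAS[true_hypothesis] else 0
        state = f(state, x)               # T_{n+1} = f(T_n, X_{n+1})
        if d(state) != true_hypothesis:   # d_n = d(T_n)
            errors += 1
    return errors / n_steps


def worst_case_error(n_steps=20000):
    """Empirical analogue of the minimax criterion (1.2)."""
    return max(simulate_error_rate(k, n_steps) for k in (1, 2))
```

Calling `worst_case_error()` returns the larger of the two estimated error frequencies; a better pair `(f, d)` would make this number smaller.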

The pair $(f,d)$, together with the domains and ranges of the two functions, is formally equivalent to the definition of a finite automaton (see e.g. [1]). The automaton has $S = \{1, \ldots, m\}$ as its state space, $\mathcal{X}$ as its input space, $\{H_1, \ldots, H_K\}$ as its output space, and $f$ and $d$ as its state-transition function and state-output function respectively. If the sequence $X_1, X_2, \ldots$ of i.i.d. random variables is applied to the input of such an automaton, the resulting sequence of states $T_1, T_2, \ldots$ is a time-homogeneous Markov chain with transition probabilities

$$p_{ij} = P_k(\{x \in \mathcal{X} : f(i,x) = j\}), \qquad i,j \in S, \tag{1.3}$$

under hypothesis $H_k$. Hence the limit in (1.1) always exists. If the state-transition function $f$ is such that the resulting chain is regular then in fact

$$P_e^{(k)}(f,d) = 1 - \mu_k(d^{-1}(H_k)),$$

where $\mu_k$ is the stationary distribution on $S$ under $H_k$. Throughout this paper we assume that this is the case, i.e., we consider only transition functions which yield regular Markov chains under each hypothesis.

Following Hellman and Cover [3] we would like to include the possibility that the transition function $f$ can be randomized. One way of defining such a randomization would be to introduce another input sequence $Y_1, Y_2, \ldots$ of i.i.d. random variables, independent of the sequence $X_1, X_2, \ldots$, and uniformly distributed on the interval $[0,1]$. The transition probabilities (1.3) would then be

$$p_{ij} = E_k\{p_{ij}(X)\}, \quad \text{where} \tag{1.4}$$

$$p_{ij}(x) = \lambda(\{y \in [0,1] : f(x,y,i) = j\}), \tag{1.5}$$

$\lambda$ being Lebesgue measure on $[0,1]$.

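However, we find it more convenient to express the randomized state-transition function $f$ as a pair $(A, \Delta)$ as follows:

$$A = \{A_{ij} : i = 1, \ldots, m;\ j = 1, \ldots, m;\ i \neq j\},$$

where the $A_{ij}$ are measurable subsets of $\mathcal{X}$, and

$$\Delta = \{\delta_{ij} : i = 1, \ldots, m;\ j = 1, \ldots, m;\ i \neq j\},$$

where $\delta_{ij} \ge 0$ and $\sum_{j \neq i} \delta_{ij} \le 1$ for all $i$. The transition probabilities (1.5), if $X = x$ is observed, are now defined by

$$p_{ij}(x) = \delta_{ij} \ \text{whenever } x \in A_{ij} \ (\text{and } 0 \text{ otherwise}), \quad i \neq j,$$

$$p_{ii}(x) = 1 - \sum_{j \neq i} p_{ij}(x),$$

and (1.3) becomes

$$p_{ij} = P_k(A_{ij})\,\delta_{ij} \ \text{ if } i \neq j, \qquad p_{ii} = 1 - \sum_{j \neq i} p_{ij}. \tag{1.6}$$

We will refer to the triplet $(A, \Delta, d)$ as a randomized finite automaton (RFA) and to the set $\Delta$ as a randomization.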
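As a rough illustration of (1.6), the sketch below builds the transition matrix of an RFA under one hypothesis, finds its stationary distribution by simple power iteration, and evaluates the error probability of a regular chain. The data layout (dictionaries keyed by state pairs, states numbered from 0) and all function names are assumptions made for this example, and power iteration is just one convenient way to obtain the stationary distribution.

```python
def transition_matrix(P_k, A, delta, m):
    """Build [p_ij] from (1.6): p_ij = P_k(A_ij) * delta_ij for i != j.

    P_k   : dict mapping each observation x to its probability under H_k
    A     : dict mapping (i, j) to the set A_ij of observations
    delta : dict mapping (i, j) to delta_ij
    """
    p = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j and (i, j) in A:
                mass = sum(P_k[x] for x in A[(i, j)])      # P_k(A_ij)
                p[i][j] = mass * delta.get((i, j), 0.0)
        p[i][i] = 1.0 - sum(p[i])                          # p_ii = 1 - sum_{j != i} p_ij
    return p


def stationary(p, iterations=5000):
    """Approximate the stationary distribution of a regular chain by power iteration."""
    m = len(p)
    mu = [1.0 / m] * m
    for _ in range(iterations):
        mu = [sum(mu[i] * p[i][j] for i in range(m)) for j in range(m)]
    return mu


def error_probability(mu, d, k):
    """1 - mu_k(d^{-1}(H_k)), where d[i] is the hypothesis announced in state i."""
    return 1.0 - sum(w for w, out in zip(mu, d) if out == k)
```

For instance, with two states, observations `{0, 1}`, `P_k = {0: 0.8, 1: 0.2}`, `A = {(0, 1): {1}, (1, 0): {0}}` and all `delta` equal to 0.5, the chain has `p[0][1] = 0.1` and `p[1][0] = 0.4`.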
Notice that the class of all randomizations is closed with respect to multiplication of corresponding elements; that is, if $\Delta = \{\delta_{ij}\}$ and $\Delta' = \{\delta'_{ij}\}$ are randomizations then $\Delta\Delta' = \{\delta_{ij}\delta'_{ij}\}$ is again a randomization. Notice also that the sets $A_{ij}$ need not be disjoint.

We now present a simple lemma to be used in the remaining sections.

Lemma 1: Let $(A, \Delta, d)$ be an RFA, and let $\mu_k$, $k = 1, \ldots, K$, be the stationary distributions of the resulting Markov chain of states. Let $R = [r_{k\ell}]$ be the $K \times K$ matrix with positive entries

$$r_{k\ell} = \frac{\mu_k(d^{-1}(H_\ell))}{\mu_k(d^{-1}(H_k))},$$

and let $\rho(A, \Delta, d)$ be the maximal eigenvalue of $R$. Then

$$P_e(A, \Delta, d) \ge 1 - \frac{1}{\rho(A, \Delta, d)},$$

and there exists a randomization $\Delta'$ such that

$$P_e(A, \Delta\Delta', d) = 1 - \frac{1}{\rho(A, \Delta, d)}.$$

Proof:

$$(1 - P_e)^{-1} = \bigl(1 - \max_k P_e^{(k)}\bigr)^{-1} = \max_k \bigl(1 - P_e^{(k)}\bigr)^{-1} = \max_k \bigl(\mu_k(d^{-1}(H_k))\bigr)^{-1} = \max_k \sum_{\ell=1}^{K} r_{k\ell} \ge \rho(A, \Delta, d),$$

since by the Perron-Frobenius theorem the maximal eigenvalue of a positive matrix can never exceed the largest of the row-sums.
To prove the second statement let $v = (v_1, \ldots, v_K)$ be an eigenvector corresponding to $\rho(A, \Delta, d)$ and normalized such that

$$v_k > 0, \quad k = 1, \ldots, K, \qquad v_1 + v_2 + \cdots + v_K = a,$$

where $0 < a < 1$ is an arbitrary constant. (This is always possible since the matrix $R$ is positive.) Define the randomization $\Delta' = \{\delta'_{ij}\}$ by

$$\delta'_{ij} = \frac{1}{u_i}, \quad i \neq j, \qquad \text{where } u_i = v_k \text{ whenever } i \in d^{-1}(H_k).$$

Let $p^{(k)}_{ij}$ and $p'^{(k)}_{ij}$ be the transition probabilities and $\mu_k$ and $\mu'_k$ the stationary distributions of $(A, \Delta, d)$ and $(A, \Delta\Delta', d)$ respectively. Then by (1.6) for any $k$ and $i \neq j$

$$p'^{(k)}_{ij} = \frac{1}{u_i}\, p^{(k)}_{ij},$$

and hence for any partition of the state space $S$ into two subsets $S_1$ and $S_2$ we must have

$$\sum_{i \in S_1} \sum_{j \in S_2} \mu'_k(i)\, p'^{(k)}_{ij} = \sum_{i \in S_2} \sum_{j \in S_1} \mu'_k(i)\, p'^{(k)}_{ij},$$

that is,

$$\sum_{i \in S_1} \sum_{j \in S_2} \frac{\mu'_k(i)}{u_i}\, p^{(k)}_{ij} = \sum_{i \in S_2} \sum_{j \in S_1} \frac{\mu'_k(i)}{u_i}\, p^{(k)}_{ij}.$$

Thus $\mu'_k(i) = C_k u_i \mu_k(i)$ for all $i$, $k$ with $C_k > 0$ independent of $i$, so that

$$r'_{k\ell} = \frac{v_\ell}{v_k}\, r_{k\ell}$$

for all $k, \ell = 1, \ldots, K$. But then for all $k$

$$\sum_{\ell=1}^{K} r'_{k\ell} = \sum_{\ell=1}^{K} r_{k\ell}\, \frac{v_\ell}{v_k} = \rho(A, \Delta, d),$$

since $v$ is an eigenvector corresponding to $\rho(A, \Delta, d)$.

Q.E.D.
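The inequality of Lemma 1 is easy to check numerically. The sketch below is only an illustration under assumed conventions (hypotheses and states numbered from 0, stationary distributions passed in as plain lists, hypothetical function names); it forms the matrix $R$, estimates its maximal eigenvalue by power iteration, and compares $1 - 1/\rho$ with the worst-case error.

```python
def lemma1_matrix(mus, d, K):
    """R[k][l] = mu_k(d^{-1}(H_l)) / mu_k(d^{-1}(H_k)).

    mus : list of K stationary distributions (one list of state weights per hypothesis)
    d   : d[i] is the index of the hypothesis announced in state i
    """
    def mass(mu, h):
        return sum(w for w, out in zip(mu, d) if out == h)

    return [[mass(mus[k], l) / mass(mus[k], k) for l in range(K)] for k in range(K)]


def max_eigenvalue(R, iterations=2000):
    """Power iteration; adequate for a matrix with positive entries."""
    n = len(R)
    v = [1.0] * n
    rho = 0.0
    for _ in range(iterations):
        w = [sum(R[k][l] * v[l] for l in range(n)) for k in range(n)]
        rho = max(w)
        v = [x / rho for x in w]
    return rho


def worst_error(mus, d, K):
    """max_k (1 - mu_k(d^{-1}(H_k))), the minimax error of a regular RFA."""
    return max(1.0 - sum(w for w, out in zip(mus[k], d) if out == k) for k in range(K))
```

With `mus = [[0.8, 0.2], [0.3, 0.7]]` and `d = [0, 1]`, the worst-case error is 0.3, which is not smaller than `1 - 1 / max_eigenvalue(R)`, as the lemma requires.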


2.   UPPER BOUND ON THE ERROR PROBABILITY FOR THE 3-HYPOTHESES 3-STATE PROBLEM

Let $K = 3$, $m = 3$, and let

$$P_e^* = \inf P_e(A, \Delta, d),$$

the infimum being taken over the class of all 3-state RFA. For $i,j = 1,2,3$ let

$$\gamma_{ij} = \frac{\sup_{A \subset \mathcal{X}} P_i(A)/P_j(A)}{\inf_{A \subset \mathcal{X}} P_i(A)/P_j(A)}, \tag{2.1}$$

and let

$$g = (1/3)\bigl(\gamma_{12}^{-1} + \gamma_{23}^{-1} + \gamma_{13}^{-1}\bigr),$$

$$G = (1/2)\max\bigl\{\gamma_{12}^{-1} + \gamma_{23}^{-1},\ \gamma_{12}^{-1} + \gamma_{13}^{-1},\ \gamma_{23}^{-1} + \gamma_{13}^{-1}\bigr\}.$$

In this section we show that

$$P_e^* \le 1 - \Bigl(1 + 2 g^{1/2} \cosh\bigl((1/3)\,\mathrm{argcosh}\,(G g^{-3/2})\bigr)\Bigr)^{-1}. \tag{2.2}$$

We also establish a simpler but looser bound, namely

$$P_e^* \le 1 - \bigl(1 + 2 G^{1/3}\bigr)^{-1}. \tag{2.3}$$

Notice that (2.3) implies that if $\gamma_{12} = \gamma_{23} = \gamma_{13} = +\infty$ then $P_e^* = 0$, a result obtained by Sagalowicz in [4] and extended later by Yakowitz in [5].
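For concreteness, the two bounds can be evaluated for given values of $\gamma_{12}$, $\gamma_{23}$, $\gamma_{13}$. The following sketch assumes the reading of (2.2) and (2.3) in which $g$ and $G$ are built from the reciprocals $\gamma_{ij}^{-1}$; the function name and the guard for the degenerate case $g = 0$ are additions made for this example.

```python
import math


def upper_bounds(g12, g23, g13):
    """Return (bound (2.2), bound (2.3)) on P_e^* for gamma_ij >= 1 (math.inf allowed)."""
    inv = [1.0 / g12, 1.0 / g23, 1.0 / g13]
    g = sum(inv) / 3.0
    G = max(inv[0] + inv[1], inv[0] + inv[2], inv[1] + inv[2]) / 2.0
    if g == 0.0:                      # all gammas infinite: both bounds vanish
        return 0.0, 0.0
    # G * g^(-3/2) >= 1 because g <= G <= 1; clamp to guard against rounding
    arg = max(1.0, G * g ** -1.5)
    tight = 1.0 - 1.0 / (1.0 + 2.0 * math.sqrt(g) * math.cosh(math.acosh(arg) / 3.0))
    loose = 1.0 - 1.0 / (1.0 + 2.0 * G ** (1.0 / 3.0))
    return tight, loose
```

When all three $\gamma_{ij}$ equal 1 (indistinguishable hypotheses) both expressions reduce to 2/3, the error of a blind guess among three hypotheses, and for infinite $\gamma_{ij}$ both are 0.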
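Proof of (2.2) and (2.3):

Let the letters $i,j,k$ stand for a permutation of $1,2,3$ and let $\varepsilon > 0$. Let $A_{ij}(\varepsilon)$ be measurable subsets of $\mathcal{X}$ such that

$$\frac{P_i(A_{ij}(\varepsilon))}{P_j(A_{ij}(\varepsilon))} \le \inf_{A \subset \mathcal{X}} \frac{P_i(A)}{P_j(A)} + \varepsilon,$$

and let

$$\gamma_{ij}(\varepsilon) = \frac{P_i(A_{ji}(\varepsilon))\, P_j(A_{ij}(\varepsilon))}{P_j(A_{ji}(\varepsilon))\, P_i(A_{ij}(\varepsilon))},$$

so that $\gamma_{ij}(\varepsilon) \to \gamma_{ij}$ as $\varepsilon \to 0$. Consider a 3-state RFA $(A, \Delta, d)$, where $A = \{A_{ij}(\varepsilon)\}$, $\Delta = \{\delta_{ij}\}$ such that $\delta_{ik} = \delta_{ki} = 0$, otherwise arbitrary, and $d(i) = H_i$, $d(j) = H_j$, $d(k) = H_k$.

[Figure: state-transition diagram of this RFA, a chain $i \leftrightarrow j \leftrightarrow k$ whose arcs carry the transition probabilities $P(A_{ij})\delta_{ij}$, $P(A_{ji})\delta_{ji}$, $P(A_{jk})\delta_{jk}$, $P(A_{kj})\delta_{kj}$.]

The stationary distribution of the resulting Markov chain of states is (see Appendix A)

$$\mu(i) = c\, P(A_{ji})\delta_{ji}\, P(A_{kj})\delta_{kj},$$

$$\mu(j) = c\, P(A_{ij})\delta_{ij}\, P(A_{kj})\delta_{kj},$$

$$\mu(k) = c\, P(A_{jk})\delta_{jk}\, P(A_{ij})\delta_{ij},$$

where $c$ is a normalizing constant. (The $\varepsilon$ has been dropped temporarily to ease the notation.) The matrix $R$ of Lemma 1 is given by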
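$$R = \begin{pmatrix}
1 & \dfrac{P_i(A_{ij})\delta_{ij}}{P_i(A_{ji})\delta_{ji}} & \dfrac{P_i(A_{jk})P_i(A_{ij})\delta_{jk}\delta_{ij}}{P_i(A_{ji})P_i(A_{kj})\delta_{ji}\delta_{kj}} \\[2ex]
\dfrac{P_j(A_{ji})\delta_{ji}}{P_j(A_{ij})\delta_{ij}} & 1 & \dfrac{P_j(A_{jk})\delta_{jk}}{P_j(A_{kj})\delta_{kj}} \\[2ex]
\dfrac{P_k(A_{ji})P_k(A_{kj})\delta_{ji}\delta_{kj}}{P_k(A_{jk})P_k(A_{ij})\delta_{jk}\delta_{ij}} & \dfrac{P_k(A_{kj})\delta_{kj}}{P_k(A_{jk})\delta_{jk}} & 1
\end{pmatrix}$$

(rows and columns ordered $i, j, k$),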
and its characteristic equation has the form

$$(1 - \lambda)^3 - (1 - \lambda)\, C_\varepsilon + D_\varepsilon = 0, \tag{2.4}$$

where

$$C_\varepsilon = \gamma_{ij}^{-1}(\varepsilon) + \gamma_{jk}^{-1}(\varepsilon) + \frac{P_i(A_{jk})\, P_i(A_{ij})\, P_k(A_{ji})\, P_k(A_{kj})}{P_i(A_{ji})\, P_i(A_{kj})\, P_k(A_{jk})\, P_k(A_{ij})}$$

and

$$D_\varepsilon = \frac{P_i(A_{ij})\, P_j(A_{jk})\, P_k(A_{ji})\, P_k(A_{kj})}{P_i(A_{ji})\, P_j(A_{kj})\, P_k(A_{jk})\, P_k(A_{ij})} + \frac{P_j(A_{ji})\, P_k(A_{kj})\, P_i(A_{jk})\, P_i(A_{ij})}{P_j(A_{ij})\, P_k(A_{jk})\, P_i(A_{ji})\, P_i(A_{kj})}.$$

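Now $D_\varepsilon$ can be written as

$$D_\varepsilon = \gamma_{ij}^{-1}(\varepsilon)\, \gamma_{jk}^{-1}(\varepsilon)\, \frac{P_j(A_{ij})\, P_k(A_{ji})}{P_j(A_{ji})\, P_k(A_{ij})} + \gamma_{ij}^{-1}(\varepsilon)\, \gamma_{jk}^{-1}(\varepsilon)\, \frac{P_i(A_{jk})\, P_j(A_{kj})}{P_i(A_{kj})\, P_j(A_{jk})},$$

and hence

$$D_\varepsilon \le \gamma_{jk}^{-1}(\varepsilon)\, \gamma_{ij}^{-1}(\varepsilon)\, \gamma_{jk} + \gamma_{ij}^{-1}(\varepsilon)\, \gamma_{jk}^{-1}(\varepsilon)\, \gamma_{ij}.$$

Next, writing $C_\varepsilon$ as

$$C_\varepsilon = \gamma_{ij}^{-1}(\varepsilon) + \gamma_{jk}^{-1}(\varepsilon) + \gamma_{ik}^{-1}(\varepsilon)\, F_j(\varepsilon),$$

where

$$F_j(\varepsilon) = \frac{P_i(A_{ij})\, P_i(A_{jk})\, P_i(A_{ki})\, P_k(A_{ji})\, P_k(A_{kj})\, P_k(A_{ik})}{P_i(A_{ji})\, P_i(A_{kj})\, P_i(A_{ik})\, P_k(A_{jk})\, P_k(A_{ij})\, P_k(A_{ki})},$$

it is seen that by setting $i,j,k$ equal to the three cyclic permutations of $1,2,3$ we must have

$$F_1(\varepsilon)\, F_2(\varepsilon)\, F_3(\varepsilon) = 1.$$

Hence $i,j,k$ can be chosen such that

$$C_\varepsilon \le \gamma_{12}^{-1}(\varepsilon) + \gamma_{23}^{-1}(\varepsilon) + \gamma_{13}^{-1}(\varepsilon).$$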

Now it is easily verified that the maximal root of the equation (2.4), which is real and not smaller than 1, is an increasing and continuous function of both the coefficients $C_\varepsilon$ and $D_\varepsilon$. Thus by Lemma 1 there is for every $\varepsilon > 0$ a 3-state RFA for which the error probability satisfies

$$P_e \le 1 - r^{-1} + \varepsilon, \tag{2.5}$$

where $r$ is the maximal root of the equation

$$(1 - \lambda)^3 - (1 - \lambda)\, C_0 + D_0 = 0, \tag{2.6}$$

with

$$C_0 = \gamma_{12}^{-1} + \gamma_{23}^{-1} + \gamma_{13}^{-1},$$

and

$$D_0 = \max\bigl\{\gamma_{12}^{-1} + \gamma_{23}^{-1},\ \gamma_{12}^{-1} + \gamma_{13}^{-1},\ \gamma_{23}^{-1} + \gamma_{13}^{-1}\bigr\}.$$

Clearly $(1/3)C_0 \le (1/2)D_0$, and since $(1/2)D_0 \le 1$ we must have $((1/3)C_0)^3 \le ((1/2)D_0)^2$. Hence the maximal root of the cubic equation (2.6) is given by

$$r = 1 + 2\,((1/3)C_0)^{1/2} \cosh\bigl((1/3)\varphi\bigr),$$

where $\cosh \varphi = (1/2)\, D_0\, ((1/3)C_0)^{-3/2}$, and the bound (2.2) follows from (2.5). The simplified bound (2.3) can be obtained by increasing $C_0$ until $((1/3)C_0)^3 = ((1/2)D_0)^2$, the maximal root of (2.6) thus becoming

$$r = 1 + 2\,((1/2)D_0)^{1/3}.$$
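The closed form for the maximal root can be checked by substituting it back into the cubic. The sketch below is an illustrative check only; the function names and the sample coefficients are assumptions, and the formula is applied under the stated condition $((1/3)C)^3 \le ((1/2)D)^2$.

```python
import math


def max_root(C, D):
    """Largest root of (1 - lam)^3 - (1 - lam) * C + D = 0,
    assuming C >= 0, D > 0 and (C/3)^3 <= (D/2)^2."""
    c, h = C / 3.0, D / 2.0
    if c == 0.0:
        return 1.0 + D ** (1.0 / 3.0)          # equation reduces to (lam - 1)^3 = D
    arg = max(1.0, h * c ** -1.5)              # cosh(phi); clamp against rounding
    return 1.0 + 2.0 * math.sqrt(c) * math.cosh(math.acosh(arg) / 3.0)


def residual(lam, C, D):
    """Left-hand side of the cubic, which should vanish at a root."""
    return (1.0 - lam) ** 3 - (1.0 - lam) * C + D
```

Raising `C` to the boundary value `3 * (D / 2) ** (2 / 3)` makes `max_root` agree with the simplified expression `1 + 2 * (D / 2) ** (1 / 3)`, which is the step that leads from (2.2) to (2.3).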


3.   COUNTEREXAMPLE TO TREE-CONJECTURE.

Consider a K-hypotheses problem and assume for simplicity that the support of each of the distributions $P_k$ is the same. With each RFA $(A, \Delta, d)$ we can now associate a graph $\Gamma$ with vertices corresponding to states of the RFA and with an arc joining vertices $i$ and $j$ if and only if $p_{ij}\, p_{ji} \neq 0$. (This property does not depend on the hypothesis because of our assumption.)

Let $\varepsilon > 0$ and let $\mathcal{C}_\varepsilon$ be the class of all $\varepsilon$-optimal RFA, i.e., all m-state RFA $(A, \Delta, d)$ such that

$$P_e(A, \Delta, d) \le \inf P_e(A, \Delta, d) + \varepsilon.$$

It has been conjectured by Cover [2] that for every $\varepsilon > 0$ the class $\mathcal{C}_\varepsilon$ always contains an RFA whose graph $\Gamma$ is a tree. This is indeed true for $K = 2$ ([3]), and a plausible heuristic argument can be given for such a structure even for $K > 2$. Unfortunately, as we are going to show in this section, the conjecture is false already for $K = 3$. We do this by exhibiting a nontrivial 3-hypotheses problem and a 3-state RFA with a triangular graph $\Gamma$, which is strictly better than any 3-state RFA whose graph is a tree.

Let $\mathcal{X} = \{1,2,3,4,5,6\}$, and let $p, q, r, s$ be positive numbers such that

$$2p + 2q + r + s = 1$$

and

$$1 < \frac{r}{s} \ll \frac{p}{q}.$$

Define the three distributions $P_1, P_2, P_3$ as follows:

                          x
    P_k(x)      1    2    3    4    5    6
    k = 1       p    q    p    q    s    r
    k = 2       q    p    r    s    p    q
    k = 3       s    r    q    p    q    p

Consider now an RFA $(A, \Delta, d)$ with the state space $S = \{1,2,3\}$, $d(i) = H_i$, $i \in S$, and the graph $\Gamma$ the tree (1) - (2) - (3). The matrix $R$ for this RFA is the same as the matrix $R$ of Section 2 with $(i,j,k) = (1,2,3)$. Since by Lemma 1 the error probability is determined by the maximal eigenvalue $\rho$ of $R$, and $\rho$ is always at least as large as the smallest row-sum, in order to minimize $\rho$ we are forced to choose $A$ as follows:

$$A_{12} = \{2\}, \quad A_{21} = \{1\}, \quad A_{23} = \{6\}, \quad A_{32} = \{5\}.$$
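The matrix $R$ then becomes

$$R = \begin{pmatrix}
1 & \dfrac{q\,\delta_{12}}{p\,\delta_{21}} & \dfrac{qr\,\delta_{23}\delta_{12}}{ps\,\delta_{21}\delta_{32}} \\[2ex]
\dfrac{q\,\delta_{21}}{p\,\delta_{12}} & 1 & \dfrac{q\,\delta_{23}}{p\,\delta_{32}} \\[2ex]
\dfrac{qs\,\delta_{21}\delta_{32}}{pr\,\delta_{23}\delta_{12}} & \dfrac{q\,\delta_{32}}{p\,\delta_{23}} & 1
\end{pmatrix}.$$

Writing its characteristic equation again as

$$(1 - \lambda)^3 - (1 - \lambda)\, C + D = 0,$$

we have

$$C = 3\left(\frac{q}{p}\right)^2, \qquad D = \left(\frac{q}{p}\right)^3 \left(\frac{r}{s} + \frac{s}{r}\right). \tag{3.1}$$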

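There are two other 3-state RFA whose graph $\Gamma$ is a tree, one with the graph (2) - (1) - (3) and one with the graph (1) - (3) - (2). By the same reasoning as before we are forced to choose

$$A_{12} = \{2\}, \quad A_{21} = \{1\}, \quad A_{13} = \{4\}, \quad A_{31} = \{3\}$$

for the former, and

$$A_{13} = \{4\}, \quad A_{31} = \{3\}, \quad A_{23} = \{6\}, \quad A_{32} = \{5\}$$

for the latter. The matrices $R$ are

$$\begin{pmatrix}
1 & \dfrac{q}{p} & \dfrac{q}{p} \\[1.5ex]
\dfrac{q}{p} & 1 & \dfrac{qs}{pr} \\[1.5ex]
\dfrac{q}{p} & \dfrac{qr}{ps} & 1
\end{pmatrix}
\qquad \text{and} \qquad
\begin{pmatrix}
1 & \dfrac{qs}{pr} & \dfrac{q}{p} \\[1.5ex]
\dfrac{qr}{ps} & 1 & \dfrac{q}{p} \\[1.5ex]
\dfrac{q}{p} & \dfrac{q}{p} & 1
\end{pmatrix}$$

respectively, where we omitted the $\delta$'s for the sake of simplicity. Hence the coefficients of their characteristic equations are again given by (3.1).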
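Now consider a 3-state RFA $(A, \Delta, d)$ with

$$A_{12} = \{2\}, \quad A_{21} = \{1\}, \quad A_{13} = \{4\}, \quad A_{31} = \{3\}, \quad A_{23} = \{6\}, \quad A_{32} = \{5\},$$

$\delta_{ij} = 1/2$ for all $i \neq j$ and $d(i) = H_i$, $i = 1,2,3$. The graph $\Gamma$ of this RFA is a triangle. The matrix $R$ is

$$R = \begin{pmatrix}
1 & \dfrac{q}{p}\,\dfrac{p+2s}{p+r+s} & \dfrac{q}{p}\,\dfrac{p+2r}{p+r+s} \\[2ex]
\dfrac{q}{p}\,\dfrac{p+2r}{p+r+s} & 1 & \dfrac{q}{p}\,\dfrac{p+2s}{p+r+s} \\[2ex]
\dfrac{q}{p}\,\dfrac{p+2s}{p+r+s} & \dfrac{q}{p}\,\dfrac{p+2r}{p+r+s} & 1
\end{pmatrix},$$

and the coefficients of its characteristic equation are

$$C = 3\left(\frac{q}{p}\right)^2 \frac{p+2r}{p+r+s}\, \frac{p+2s}{p+r+s} = 3\left(\frac{q}{p}\right)^2 \left(1 - \left(\frac{r-s}{p+r+s}\right)^2\right)$$

and

$$D = \left(\frac{q}{p}\right)^3 \left[\left(\frac{p+2r}{p+r+s}\right)^3 + \left(\frac{p+2s}{p+r+s}\right)^3\right].$$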
Comparing these expressions with (3.1) we see that $C_{\text{triangle}} < C_{\text{tree}}$ and, with $r$ and $s$ suitably chosen, also $D_{\text{triangle}} < D_{\text{tree}}$. (Choose, for instance, $r = 10^{-1} p$, $s = 10^{-3} p$. Then $D_{\text{tree}} / D_{\text{triangle}} = 10^{4}$.) Since the maximal eigenvalue increases with both $C$ and $D$ we conclude from Lemma 1 that the best "tree" RFA has an error probability strictly larger than this "triangular" RFA.

4.   MINIMAX THEOREM FOR FINITE-MEMORY PROBLEMS.

Let $\pi = (\pi_1, \ldots, \pi_K)$ be a probability distribution on the set of hypotheses, let $(A, \Delta, d)$ be an RFA, and let this time the error probability be

$$P_e(\pi; (A, \Delta, d)) = \sum_{k=1}^{K} \pi_k P_e^{(k)}(A, \Delta, d).$$

Looking now at the problem as a two-person zero-sum game, where the first player (Nature) chooses $\pi$ and the second player (Statistician) chooses $(A, \Delta, d)$, it is natural to ask whether

$$\inf_{(A,\Delta,d)}\ \sup_{\pi}\ P_e(\pi; (A, \Delta, d)) = \sup_{\pi}\ \inf_{(A,\Delta,d)}\ P_e(\pi; (A, \Delta, d)). \tag{4.1}$$

Now if $K = 2$ then it is known [3] that

$$\inf_{(A,\Delta,d)} P_e(\pi; (A, \Delta, d)) = \frac{2\,(\pi_1 \pi_2 \gamma_{12}^{m-1})^{1/2} - 1}{\gamma_{12}^{m-1} - 1}, \tag{4.2}$$

where $\gamma_{12}$ is given by (2.1). Hence

$$\sup_{\pi}\ \inf_{(A,\Delta,d)} P_e = \frac{(\gamma_{12}^{m-1})^{1/2} - 1}{\gamma_{12}^{m-1} - 1} = 1 - \bigl(1 + \gamma_{12}^{-(m-1)/2}\bigr)^{-1}.$$
On the other hand, by Lemma 1,

$$\inf_{(A,\Delta,d)}\ \sup_{\pi}\ P_e = 1 - \Bigl(\inf_{(A,\Delta,d)} \rho(A, \Delta, d)\Bigr)^{-1},$$

and for $K = 2$ it is easily seen that

$$\rho(A, \Delta, d) = 1 + \left(\frac{\mu_1(d^{-1}(H_2))\, \mu_2(d^{-1}(H_1))}{\mu_1(d^{-1}(H_1))\, \mu_2(d^{-1}(H_2))}\right)^{1/2}.$$

However, it has also been shown in [3] that

$$\inf_{(A,\Delta,d)} \frac{\mu_1(d^{-1}(H_2))\, \mu_2(d^{-1}(H_1))}{\mu_1(d^{-1}(H_1))\, \mu_2(d^{-1}(H_2))} = \frac{1}{\gamma_{12}^{m-1}},$$

and hence (4.1) is indeed true for $K = 2$.

Conjecture: (4.1) is also true for $K > 2$.

Comment: Since an analog of (4.2) for $K > 2$ is not available at present, the above reasoning cannot be applied to prove the conjecture. However, since the number of hypotheses is finite, (4.1) would follow if one could show that the set of all vectors

$$\bigl(P_e^{(1)}(A, \Delta, d), \ldots, P_e^{(K)}(A, \Delta, d)\bigr),$$

where $(A, \Delta, d)$ runs through all m-state RFA, is convex. This is indeed so for $K = 2$; unfortunately we have not been able to prove this even for $K = 3$.
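The two-hypothesis computation above is easy to reproduce numerically. The sketch below is only a sanity check of the algebra, with hypothetical function names; it treats the right-hand side of (4.2) as a function of $\pi_1$ (with $\pi_2 = 1 - \pi_1$) for an assumed $\gamma_{12} > 1$, maximizes it over a grid, and compares the maximum with $1 - (1 + \gamma_{12}^{-(m-1)/2})^{-1}$.

```python
import math


def bayes_value(pi1, gamma, m):
    """Right-hand side of (4.2) as a function of pi_1, with pi_2 = 1 - pi_1."""
    g = gamma ** (m - 1)
    return (2.0 * math.sqrt(pi1 * (1.0 - pi1) * g) - 1.0) / (g - 1.0)


def minimax_value(gamma, m):
    """1 - (1 + gamma^{-(m-1)/2})^{-1}, the common value of both sides of (4.1) for K = 2."""
    return 1.0 - 1.0 / (1.0 + gamma ** (-(m - 1) / 2.0))


def maximizing_prior(gamma, m, steps=1000):
    """Grid search for the prior pi_1 that maximizes the expression in (4.2)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p1: bayes_value(p1, gamma, m))
```

The grid search returns the uniform prior $\pi_1 = 1/2$, at which `bayes_value` coincides with `minimax_value`.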
APPENDIX A

A Formula For A Stationary Distribution Of A Finite Markov Chain.

Let $P = [p_{ij}]$ be an $m \times m$ stochastic matrix, and let $g = (S, E)$ be an oriented graph with the set of vertices $S = \{1, \ldots, m\}$ and the set of arcs $E \subset S \times S$ defined by

$$(i,j) \in E \iff i \neq j \ \text{and} \ p_{ij} > 0.$$

Let $i \in S$ be a vertex of $g$. Then a vertex $j \in S$ such that $(i,j) \in E$ is called a successor of $i$. A sequence of vertices $(i_1, i_2, \ldots, i_n)$ such that each $i_{k+1}$ is a successor of $i_k$, $k = 1, \ldots, n-1$, is called a path. If $i_1$ is also a successor of $i_n$ the path $(i_1, \ldots, i_n)$ is called a cycle.

Consider now a subgraph $f = (S, F)$, where $F \subset E$, with the following properties.

1)  each vertex $i \in S$ has at most one successor.

2)  $f$ has no cycles.

3)  $f$ is maximal, i.e., no further arcs can be added without violating 1) or 2).

We will call such a subgraph a confluence. Notice that each confluence has exactly one vertex with no successor. We will refer to this vertex as a sink.

With each confluence $f = (S, F)$ we associate a positive number

$$p(f) = \prod_{(i,j) \in F} p_{ij}.$$

We now have the following theorem:
We  now  have  the  following  theorem: 
Theorem: Let $P$ be a transition probability matrix of a homogeneous Markov chain, let $g$ be the graph defined above, and let $\phi_i$ be the set of all confluences with sink $i \in S$. If $P$ has an invariant distribution $(\mu_1, \ldots, \mu_m) = (\mu_1, \ldots, \mu_m)P$ then

$$\mu_i = C \sum_{f \in \phi_i} p(f), \qquad i \in S, \tag{A.1}$$

where $C > 0$ is a normalizing constant determined from $\mu_1 + \cdots + \mu_m = 1$.

Remark: Notice that the formula (A.1) gives $\mu_i$ as a sum of products of the off-diagonal entries of $P$; each product contains exactly $m - 1$ different entries and no two products contain the same set of $p_{ij}$'s. In this sense, the representation of $\mu_i$ is unique. Notice also that if all off-diagonal entries of $P$ are positive then $\mu_i$ is a sum of exactly $m^{m-2}$ products. Thus, although the formula is certainly of theoretical interest, its application for computing the stationary distribution is likely to be limited to cases where a majority of the $p_{ij}$'s are zero.


Example: Let

$$P = \begin{pmatrix}
.5 & .2 & .3 & 0 \\
.3 & .1 & .2 & .4 \\
.1 & .7 & .2 & 0 \\
0 & .5 & 0 & .5
\end{pmatrix}.$$

[Figure: the graph $g$, with arcs (1,2), (1,3), (2,1), (2,3), (2,4), (3,1), (3,2), (4,2).]

The confluences together with the numbers $p(f)$ are as follows (the diagrams of the individual confluences are omitted):

    Sink i    p(f) for each confluence f in phi_i    Sum of p(f)
      1          .015    .105    .010                   .130
      2          .010    .070    .105                   .185
      3          .045    .030    .020                   .095
      4          .008    .056    .084                   .148
                                              Total:    .558

Hence the stationary distribution is

$$\mu = \left(\frac{130}{558},\ \frac{185}{558},\ \frac{95}{558},\ \frac{148}{558}\right).$$
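The numbers in this example can be reproduced by enumerating confluences mechanically. The following sketch is one possible way to do so, under the assumption that every state can reach every sink (so that a confluence with a given sink is exactly a choice of one successor for each other vertex with all paths ending in the sink); the function names are hypothetical, states are numbered from 0, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product


def reaches(v, sink, nxt, m):
    """Follow successors from v for at most m steps; True if the sink is reached."""
    for _ in range(m):
        if v == sink:
            return True
        v = nxt[v]
    return v == sink


def confluence_weights(P):
    """For each sink i, the sum of p(f) over all confluences f with sink i,
    i.e. the right-hand side of (A.1) without the constant C."""
    m = len(P)
    weights = []
    for sink in range(m):
        others = [v for v in range(m) if v != sink]
        # each non-sink vertex chooses one successor along an arc of g
        choices = [[j for j in range(m) if j != v and P[v][j] > 0] for v in others]
        total = Fraction(0)
        for succ in product(*choices):
            nxt = dict(zip(others, succ))
            if all(reaches(v, sink, nxt, m) for v in others):   # no cycles
                w = Fraction(1)
                for v in others:
                    w *= P[v][nxt[v]]
                total += w
        weights.append(total)
    return weights


# The example matrix, entered in tenths.
TENTHS = [[5, 2, 3, 0], [3, 1, 2, 4], [1, 7, 2, 0], [0, 5, 0, 5]]
P_EXAMPLE = [[Fraction(x, 10) for x in row] for row in TENTHS]
```

Normalizing `confluence_weights(P_EXAMPLE)` by its sum gives the stationary distribution of the example, and multiplying that vector by `P_EXAMPLE` returns it unchanged.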
Proof of the Theorem: We will show that (A.1) satisfies the equations

$$\mu_j = \sum_{i=1}^{m} \mu_i\, p_{ij}, \qquad j = 1, \ldots, m,$$

or equivalently

$$\mu_j \sum_{\substack{i=1 \\ i \neq j}}^{m} p_{ji} = \sum_{\substack{i=1 \\ i \neq j}}^{m} \mu_i\, p_{ij}, \qquad j = 1, \ldots, m. \tag{A.2}$$

Let $h = (S, H)$ be an arbitrary subgraph of $g$, let

$$p(h) = \prod_{(i,j) \in H} p_{ij},$$

and let $h \pm (i,j)$ denote a subgraph obtained from $h$ by adding or removing the arc $(i,j)$. Next let

$$A_j = \{f + (j,i) : f \in \phi_j,\ i \in S - \{j\}\},$$

$$B_j = \{f + (i,j) : f \in \phi_i,\ i \in S - \{j\}\}.$$

If $\mu_1, \ldots, \mu_m$ is given by (A.1) then for any $j \in S$

$$\mu_j \sum_{\substack{i=1 \\ i \neq j}}^{m} p_{ji} = C \sum_{h \in A_j} p(h) \qquad \text{and} \qquad \sum_{\substack{i=1 \\ i \neq j}}^{m} \mu_i\, p_{ij} = C \sum_{h \in B_j} p(h).$$

Thus (A.2) will follow if we show that $A_j = B_j$. To this end let

$$h \in A_j, \qquad h = f + (j,i),$$

and let $k$ be a vertex contained in the path $(i, \ldots, j)$ whose successor is $j$. If the arc $(k,j)$ is removed then $h$ becomes a confluence with sink $k$, since $(k,j)$ was an arc of the confluence $f$ and thus $k$ could not have any other successor than $j$. Hence $h$ can be written as $f' + (k,j)$, $f' \in \phi_k$, and therefore $h \in B_j$. Conversely, if $h \in B_j$, say $h = f' + (k,j)$, then $f' \in \phi_k$, and by removing the arc $(j,i)$, with $i$ being a successor of $j$ contained in the path $(j, \ldots, k)$, we conclude that $h - (j,i) \in \phi_j$. Hence $h \in A_j$ and the proof is complete.
APPENDIX B

A Generalization of a Lemma of Yakowitz ([5]).

Lemma: Consider $K$ finite regular Markov chains with state spaces $S_k$, transition probabilities $[P_k(i \to j)]$, and stationary distributions $\mu_k$, $k = 1, \ldots, K$. Link these chains together by allowing transitions between $S_k$ and $S_{k+1}$, $k = 1, \ldots, K-1$, via a pair of states $e_{k,k+1} \in S_k$, $e_{k+1,k} \in S_{k+1}$ with probabilities

$$P(e_{k,k+1} \to e_{k+1,k}) = \pi_{k,k+1},$$

$$P(e_{k+1,k} \to e_{k,k+1}) = \pi_{k+1,k},$$

and changing the original transition probabilities $P_k(e_{k,k+1} \to e_{k,k+1})$ and $P_{k+1}(e_{k+1,k} \to e_{k+1,k})$ accordingly.

If the new chain with state space $S = S_1 \cup \cdots \cup S_K$ is regular then its stationary distribution $\mu$ is given by

$$s \in S_k \implies \mu(s) = C\, \mu_k(s) \prod_{j=1}^{k-1} \pi_{j,j+1}\, \mu_j(e_{j,j+1}) \prod_{j=k+1}^{K} \pi_{j,j-1}\, \mu_j(e_{j,j-1}), \qquad k = 1, \ldots, K,$$

where $C > 0$ is a normalizing constant.

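Proof (by induction on $K$).

(i)   Let $K = 2$. We have for the original two chains

$$s \in S_1 \implies \mu_1(s) = \sum_{r \in S_1} P_1(r \to s)\, \mu_1(r),$$

$$s \in S_2 \implies \mu_2(s) = \sum_{r \in S_2} P_2(r \to s)\, \mu_2(r),$$

and for the new chain $S_{12} = S_1 \cup S_2$, with stationary distribution $\mu_{12}$,

$$s \in S_1,\ s \neq e_{12} \implies \mu_{12}(s) = \sum_{r \in S_1} P_1(r \to s)\, \mu_{12}(r),$$

$$s \in S_2,\ s \neq e_{21} \implies \mu_{12}(s) = \sum_{r \in S_2} P_2(r \to s)\, \mu_{12}(r),$$

$$\mu_{12}(e_{12}) = \sum_{r \in S_1 - \{e_{12}\}} P_1(r \to e_{12})\, \mu_{12}(r) + \bigl[P_1(e_{12} \to e_{12}) - \pi_{12}\bigr] \mu_{12}(e_{12}) + \pi_{21}\, \mu_{12}(e_{21}) = \sum_{r \in S_1} P_1(r \to e_{12})\, \mu_{12}(r),$$

since $\pi_{21}\, \mu_{12}(e_{21}) = \pi_{12}\, \mu_{12}(e_{12})$ by equating flows. Similarly

$$\mu_{12}(e_{21}) = \sum_{r \in S_2} P_2(r \to e_{21})\, \mu_{12}(r).$$

Hence if $s \in S_k$ then $\mu_k(s)$ and $\mu_{12}(s)$ satisfy the same system of linear equations and consequently

$$s \in S_1 \implies \mu_{12}(s) = a_1\, \mu_1(s),$$

$$s \in S_2 \implies \mu_{12}(s) = a_2\, \mu_2(s).$$

In particular

$$\mu_{12}(e_{12}) = a_1\, \mu_1(e_{12}), \qquad \mu_{12}(e_{21}) = a_2\, \mu_2(e_{21}),$$

and since $\pi_{12}\, \mu_{12}(e_{12}) = \pi_{21}\, \mu_{12}(e_{21})$ we must have

$$\frac{a_1}{a_2} = \frac{\pi_{21}\, \mu_2(e_{21})}{\pi_{12}\, \mu_1(e_{12})}.$$

Since $a_1 + a_2 = 1$ this implies $a_1 = C\, \pi_{21}\, \mu_2(e_{21})$, $a_2 = C\, \pi_{12}\, \mu_1(e_{12})$.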

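(ii)   Let the lemma be true for $S' = S_1 \cup \cdots \cup S_{K-1}$ and form the new chain $S' \cup S_K$. Denoting by $\mu'$ the stationary distribution of $S'$ and by $\mu$ that of $S' \cup S_K$ we have by part (i)

$$s \in S' \implies \mu(s) = C\, \pi_{K,K-1}\, \mu_K(e_{K,K-1})\, \mu'(s),$$

$$s \in S_K \implies \mu(s) = C\, \pi_{K-1,K}\, \mu'(e_{K-1,K})\, \mu_K(s). \tag{B.1}$$

By the induction hypothesis

$$\mu'(e_{K-1,K}) = C'\, \mu_{K-1}(e_{K-1,K}) \prod_{j=1}^{K-2} \pi_{j,j+1}\, \mu_j(e_{j,j+1}),$$

and if $s \in S_k \subset S'$

$$\mu'(s) = C'\, \mu_k(s) \prod_{j=1}^{k-1} \pi_{j,j+1}\, \mu_j(e_{j,j+1}) \prod_{j=k+1}^{K-1} \pi_{j,j-1}\, \mu_j(e_{j,j-1}).$$

Substitution into (B.1) gives the desired formula (with the proportionality constant $CC'$).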
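The case $K = 2$ of the lemma can be verified on a small example. The sketch below is an illustrative check with assumed data (two 2-state chains, one linking pair of states, hypothetical function names, states numbered from 0); it builds the linked chain, evaluates the formula of the lemma, and tests exactly whether the result is invariant.

```python
from fractions import Fraction as F


def link_two_chains(P1, P2, e12, e21, pi12, pi21):
    """Transition matrix of the linked chain on S_1 followed by S_2."""
    n1, n2 = len(P1), len(P2)
    P = [[F(0)] * (n1 + n2) for _ in range(n1 + n2)]
    for i in range(n1):
        for j in range(n1):
            P[i][j] = P1[i][j]
    for i in range(n2):
        for j in range(n2):
            P[n1 + i][n1 + j] = P2[i][j]
    P[e12][n1 + e21] = pi12            # new transition e_12 -> e_21
    P[e12][e12] -= pi12                # self-transition reduced accordingly
    P[n1 + e21][e12] = pi21            # new transition e_21 -> e_12
    P[n1 + e21][n1 + e21] -= pi21
    return P


def lemma_formula(mu1, mu2, e12, e21, pi12, pi21):
    """Stationary distribution predicted by the lemma for K = 2."""
    w = [x * pi21 * mu2[e21] for x in mu1] + [x * pi12 * mu1[e12] for x in mu2]
    total = sum(w)
    return [x / total for x in w]


def is_invariant(mu, P):
    """Exact test of mu = mu P."""
    n = len(P)
    return all(sum(mu[i] * P[i][j] for i in range(n)) == mu[j] for j in range(n))


# Assumed example data: two regular 2-state chains and their stationary distributions.
P1 = [[F(6, 10), F(4, 10)], [F(3, 10), F(7, 10)]]
P2 = [[F(5, 10), F(5, 10)], [F(2, 10), F(8, 10)]]
MU1 = [F(3, 7), F(4, 7)]
MU2 = [F(2, 7), F(5, 7)]
```

Linking state 1 of the first chain with state 0 of the second, with $\pi_{12} = 1/10$ and $\pi_{21} = 2/10$, the vector returned by `lemma_formula` is invariant under the matrix returned by `link_two_chains`.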